biplotEZ

User-friendly biplots in R



Centre for Multi-Dimensional Data Visualisation (MuViSU)
muvisu@sun.ac.za



SASA 2024

What are biplots?

  • The biplot is a powerful and very useful data visualisation tool.

  • Biplots make the information in a data table transparent: they reveal the main structures in the data in a methodical way, for example patterns of correlation between variables or similarities between observations.

  • A biplot is a generalisation of a two-dimensional scatter diagram of data that exists in a higher dimensional space, where information on both samples and variables can be displayed graphically.

  • There are different types of biplots, based on various multivariate data analysis techniques.

Flow of functions in biplotEZ

Main function: biplot()

Types of biplot (1, 2 and 3 dimensions): PCA(), CVA(), PCO(), CA()

Operations: prediction(), interpolate(), translate_axes(), density2D(), fit.measures(), classify(), alpha.bags(), ellipses(), rotate(), reflect(), zoom(), regress(), splines()

Aesthetic functions: samples(), axes(), newsamples(), newaxes()

Plotting: plot()

First step to create a biplot

biplot(data=iris, 
       group.aes = iris[,5],
       Title="My first biplot")
# Object of class biplot, based on 150 samples and 5 variables.
# 4 numeric variables.
# 1 categorical variable.
Argument     Description
data         A data frame or matrix containing all variables the user wants to analyse.
classes      A vector identifying class membership. Required for CVA biplots.
group.aes    Variable from the data to be used as a grouping variable.
center       A logical value indicating whether the data should be column centred, with default TRUE.
scaled       A logical value indicating whether the data should be standardised to unit column variances, with default FALSE.
Title        Title of the biplot to be rendered.
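
To make the center and scaled options concrete, here is a minimal base R sketch of what column centring and standardising do to the iris measurements. It does not call biplotEZ; the object names (X_star, X_centred, X_scaled) are illustrative only, and whether biplot() uses scale() internally is not confirmed here.

```r
# Base R illustration of the two pre-processing options (not biplotEZ code).
X_star <- as.matrix(iris[, 1:4])

# center = TRUE, scaled = FALSE: subtract each column mean.
X_centred <- scale(X_star, center = TRUE, scale = FALSE)

# center = TRUE, scaled = TRUE: also divide by each column's standard deviation.
X_scaled <- scale(X_star, center = TRUE, scale = TRUE)

round(colMeans(X_centred), 10)      # column means after centring
round(apply(X_scaled, 2, var), 10)  # column variances after standardising
```

Standardising matters when variables are measured on very different scales, because otherwise the variables with the largest variances dominate the biplot.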

Type of biplot: PCA

PCA()
Argument            Description
bp                  Object of class biplot.
dim.biplot          Dimension of the biplot. Only values 1, 2 and 3 are accepted, with default 2.
e.vects             Which eigenvectors (principal components) to extract, with default 1:dim.biplot.
group.aes           Grouping variable, if not already specified in biplot().
show.class.means    TRUE or FALSE: whether group means should be plotted in the biplot.
correlation.biplot  TRUE or FALSE: whether correlations between the variables (TRUE) or distances between the samples (FALSE) are optimally approximated.

Construction of PCA biplot

  • Consider a data matrix \({\bf{X}}^*\) of size \(n \times p\).
  • Using the Iris data as an example, there are \(n=150\) observations measured across \(p=4\) variables.
tibble(iris)
# # A tibble: 150 × 5
#    Sepal.Length Sepal.Width Petal.Length
#           <dbl>       <dbl>        <dbl>
#  1          5.1         3.5          1.4
#  2          4.9         3            1.4
#  3          4.7         3.2          1.3
#  4          4.6         3.1          1.5
#  5          5           3.6          1.4
#  6          5.4         3.9          1.7
#  7          4.6         3.4          1.4
#  8          5           3.4          1.5
#  9          4.4         2.9          1.4
# 10          4.9         3.1          1.5
# # ℹ 140 more rows
# # ℹ 2 more variables: Petal.Width <dbl>,
# #   Species <fct>
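  • To produce a biplot, we need to optimally approximate the column-centred matrix \({\bf{X}} = ({\bf{I}}_n - \frac{1}{n}{\bf{11}}'){\bf{X}}^*\).
  • We want the matrix \({\hat{\bf{X}}}\) of rank \(r\) that minimises \(|| {\hat{\bf{X}}} - {\bf{X}}||^2\).
  • With the singular value decomposition \({\bf{X}} = {\bf{U}} {\bf{D}} {\bf{V}}'\), the best approximation under this least squares criterion is the \(r\)-dimensional Eckart-Young approximation \({\bf{\hat{X}}}_{[r]} = {\bf{U}} {\bf{D}}_{[r]} {\bf{V}}'\), where \({\bf{D}}_{[r]}\) retains only the \(r\) largest singular values and sets the rest to zero.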
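
The construction above can be sketched in base R with svd(). This is an illustration of the Eckart-Young result only, not the internal code of PCA(); the names X_star, X_hat and loss are chosen here for the example.

```r
# Base R sketch of the rank-r Eckart-Young approximation (not biplotEZ code).
X_star <- as.matrix(iris[, 1:4])
n <- nrow(X_star)

# Column centring written as X = (I - 11'/n) X*.
X <- (diag(n) - matrix(1 / n, n, n)) %*% X_star

s <- svd(X)  # X = U D V'
r <- 2

# Keep the r largest singular values and the matching singular vectors.
X_hat <- s$u[, 1:r] %*% diag(s$d[1:r]) %*% t(s$v[, 1:r])

# The least squares loss equals the sum of the discarded squared singular values.
loss <- sum((X_hat - X)^2)
c(loss = loss, discarded = sum(s$d[-(1:r)]^2))
```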

Representing samples

A standard result when \(r = 2\) is that the row vectors of \({\bf{\hat{X}}}_{[2]}\) are the orthogonal projections of the corresponding row vectors of \({\bf{X}}\) onto the column space of \({\bf{V}}_2\), the first two columns of \({\bf{V}}\).

These projections are also known as the first two principal components.
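
As a small check of this result, the base R sketch below computes the two-dimensional sample coordinates as \({\bf{X}}{\bf{V}}_2\) and compares them with the scores returned by prcomp(). The names Z and V2 are illustrative; the comparison is made up to sign, since the sign of a singular vector is arbitrary.

```r
# Base R sketch: sample coordinates in the biplot plane (not biplotEZ code).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
V2 <- s$v[, 1:2]

Z <- X %*% V2           # coordinates of the samples in the plane
X_hat2 <- Z %*% t(V2)   # the same projections expressed in the original space

# The coordinates agree with the first two principal component scores.
pc <- prcomp(iris[, 1:4])
max(abs(abs(Z) - abs(pc$x[, 1:2])))
```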

Representing variables

Each column (variable) of \({\bf{X}}\) is represented by the corresponding row of \({\bf{V}}_2\), the first two columns of \({\bf{V}}\). These rows give the directions of the biplot axes, one axis per variable.

The arrows representing the variables in the data can be calibrated to display marker points analogous to ordinary scatterplots.
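
One common way to calibrate a prediction axis is to place the marker for the value \(\mu\) of variable \(j\) at \(\frac{\mu - \bar{x}_j}{{\bf{v}}_j'{\bf{v}}_j}{\bf{v}}_j\), where \({\bf{v}}_j\) is the \(j\)th row of \({\bf{V}}_2\). The base R sketch below applies this to Sepal.Length. It assumes this calibration rule; the exact computation inside axes() is not shown in these slides, and the names v_j, markers and tick_xy are illustrative.

```r
# Base R sketch of axis calibration for one variable (not biplotEZ code).
X_star <- as.matrix(iris[, 1:4])
mu <- colMeans(X_star)
X <- sweep(X_star, 2, mu)        # column-centred data
V2 <- svd(X)$v[, 1:2]

j <- 1                           # Sepal.Length
v_j <- V2[j, ]                   # direction of the axis for variable j
markers <- pretty(X_star[, j])   # values to mark on the axis

# Plane coordinates of each marker: ((m - mean_j) / v_j'v_j) * v_j.
tick_xy <- outer(markers - mu[j], v_j) / sum(v_j^2)
rownames(tick_xy) <- markers

# Check: predicting from a marker position returns the marker value.
drop(tick_xy %*% v_j) + mu[j]
```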

PCA biplot

biplot(data=iris, 
       group.aes = iris[,5],
       Title="My first biplot") |> PCA() |> plot()

Aesthetics: samples()

Change the colour, plotting character and character expansion of the samples.

biplot(iris, group.aes = iris[,5]) |> 
  PCA() |> 
  samples(col = c("orange","purple","gold"), pch = c(15,1,17), cex = 1.2,opacity=0.6) |> 
  plot()

Note that the aesthetics in samples() are applied per group, as defined by the group.aes argument. Here there are three groups.
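
The idea of one aesthetic value per group can be illustrated in base R by indexing the colour and plotting character vectors with the factor levels. This is only a sketch of the concept, assuming values are matched to groups in level order; how samples() performs the mapping internally is not shown here, and sample_col and sample_pch are illustrative names.

```r
# Base R sketch: one colour and plotting character per group level.
groups <- iris[, 5]                      # factor with three levels
cols <- c("orange", "purple", "gold")
pchs <- c(15, 1, 17)

# Each sample receives the aesthetic of its group.
sample_col <- cols[as.integer(groups)]
sample_pch <- pchs[as.integer(groups)]

table(groups, sample_col)
```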

Aesthetics: samples()

Select certain groups and add labels to the samples.

biplot(iris, group.aes = iris[,5]) |> 
  PCA() |> 
  samples(which=c(1,2), col = c("orange","purple"),label=TRUE) |> 
  plot()

Aesthetics: samples()

Other arguments

Argument      Description
label.col     Colour of the labels.
label.cex     Text expansion of the labels.
label.side    Side of the plotted point at which the label appears: "bottom" (default), "top", "left" or "right".
label.offset  Offset of the label from the plotted point.
connected     TRUE or FALSE: whether the samples are connected.
connect.col   Colour of the connecting line.
connect.lty   Line type of the connecting line.
connect.lwd   Line width of the connecting line.

Aesthetics: axes()

Change the colour and line width of the axes.

biplot(iris[,1:4]) |> PCA() |> samples(col="grey",opacity=0.5) |>
  axes(col = "rosybrown",label.dir = "Orthog",lwd=2) |> plot()

Aesthetics: axes()

Show the first two axes with vector representation and unit circle.

biplot(iris[,1:4]) |> PCA() |> samples(col="grey",opacity=0.5) |>
  axes(which=1:2,col = "rosybrown",vectors =TRUE,unit.circle = TRUE) |> plot()

Aesthetics: axes()

Other arguments

Axis labels: ax.names, label.dir, label.col, label.cex, label.line, label.offset

Ticks: ticks, tick.size, tick.label, tick.label.side, tick.label.col

Prediction: predict.col, predict.lwd, predict.lty

Orthogonal: orthogx, orthogy

Prediction of samples

prediction()

out <- biplot(iris[,1:4],group.aes=iris[,5]) |> PCA() |> 
  samples(col=c("orange","purple","gold"),opacity=0.5) |>
  prediction(predict.samples = c(1:2,51:52,101:102) )|>
  axes(predict.col = "red",predict.lwd = 1.5,predict.lty = 2) |> plot()

Prediction of samples

Predict only on the variable Sepal.Length: use the which argument.

biplot(iris[,1:4],group.aes=iris[,5]) |> PCA() |> 
  samples(col=c("orange","purple","gold"),opacity=0.5) |>
  prediction(predict.samples = c(1:2,51:52,101:102),which="Sepal.Length")|>
  axes(predict.col = "red",predict.lwd = 1.5,predict.lty = 2) |> plot()

Prediction of group means

biplot(iris[,1:4],group.aes=iris[,5]) |> PCA(show.class.means = TRUE) |> 
  samples(col=c("orange","purple","gold"),opacity=0.5) |>
  prediction(predict.means = TRUE)|>
  axes(predict.col = "red",predict.lwd = 1.5,predict.lty = 2) |> plot()

Predictions

summary(out)
# Object of class biplot, based on 150 samples and 4 variables.
# 4 numeric variables.
# 
# Sample predictions
#     Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1       5.083039    3.517414     1.403214   0.2135317
# 2       4.746262    3.157500     1.463562   0.2402459
# 51      6.757521    3.449014     4.739884   1.6079559
# 52      6.389336    3.210952     4.501645   1.5094058
# 101     6.751606    2.836199     5.928106   2.1069758
# 102     5.977297    2.517932     5.070066   1.7497923
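
These predictions can be cross-checked outside the package. The base R sketch below reconstructs each sample from its two-dimensional coordinates and adds the column means back, assuming the default centring without scaling. The names Z, V2 and pred are illustrative; the printed rows should agree with the sample predictions above.

```r
# Base R sketch: rank-2 predictions for selected samples (not biplotEZ code).
X_star <- as.matrix(iris[, 1:4])
mu <- colMeans(X_star)
X <- sweep(X_star, 2, mu)          # column-centred data
V2 <- svd(X)$v[, 1:2]

Z <- X %*% V2                              # sample coordinates in the plane
pred <- sweep(Z %*% t(V2), 2, mu, "+")     # back to the original scale
colnames(pred) <- colnames(X_star)

pred[c(1:2, 51:52, 101:102), ]
```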

Interpolation of samples

biplot(iris[1:100,]) |> PCA() |> 
  interpolate(newdata = iris[101:150,]) |> 
  newsamples(col="red") |> plot()
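
Interpolating new samples amounts to centring them with the means of the original data and projecting them onto the same two directions. A minimal base R sketch of this idea follows, using only the four numeric columns. It illustrates the principle and is not the code of interpolate(); train, new and Z_new are illustrative names.

```r
# Base R sketch: placing new samples in an existing PCA biplot plane.
train <- as.matrix(iris[1:100, 1:4])
new   <- as.matrix(iris[101:150, 1:4])

mu <- colMeans(train)                       # means of the original data only
V2 <- svd(sweep(train, 2, mu))$v[, 1:2]     # plane fitted to the original data

# Centre the new samples with the original means, then project.
Z_new <- sweep(new, 2, mu) %*% V2
dim(Z_new)
```

The new samples do not influence the fitted plane; they are only positioned within it.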

Interpolation of axes

biplot(iris[,1:3]) |> PCA() |> 
    interpolate(newdata = NULL, newvariable = iris[,4]) |> 
    newaxes(X.new.names = "Petal.Width") |> plot()

Translation

Automatically or manually translate the axes away from the centre of the plot.

biplot(iris)|> 
      PCA(group.aes=iris[,5]) |> 
      translate_axes(swop=TRUE,delta =0.2)|>
      plot(exp.factor=3)

Density plots

On the first group.

biplot(iris[,1:4],group.aes = iris[,5]) |> PCA() |> 
  density2D(which=1,col=c("white","purple","cyan","blue")) |> plot()
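
A two-dimensional density of this kind can be sketched with a kernel density estimate on the biplot coordinates of one group. The example below uses kde2d() from the MASS package on base R PCA coordinates. Whether density2D() uses this estimator or these settings is an assumption, and the names Z, g1 and dens are illustrative.

```r
# Sketch: 2D kernel density of the first group in the PCA plane.
library(MASS)  # recommended package shipped with standard R installations

X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
Z <- X %*% svd(X)$v[, 1:2]               # sample coordinates in the plane

g1 <- iris[, 5] == levels(iris[, 5])[1]  # samples in the first group
dens <- kde2d(Z[g1, 1], Z[g1, 2], n = 50)

# dens$x and dens$y hold the grid; dens$z holds the estimated density,
# which could be passed to image() or contour().
dim(dens$z)
```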

Density plots

On the second group, and adding contours.

biplot(iris[,1:4],group.aes = iris[,5]) |> PCA() |> 
  density2D(which=2,col=c("white","purple","cyan","blue"),contours=TRUE) |> plot()

Density plots

On the third group, and changing the colour of the contours.

biplot(iris[,1:4],group.aes = iris[,5]) |> PCA() |> 
  density2D(which=3,col=c("white","purple","cyan","blue"),contours = TRUE,contour.col = "grey") |> plot()

Fit measures

out2 <- biplot(iris[,1:4],group.aes = iris[,5]) |> PCA() |> fit.measures()
summary(out2)
# Object of class biplot, based on 150 samples and 4 variables.
# 4 numeric variables.
# 
# Quality of fit in 2 dimension(s) = 97.8% 
# Adequacy of variables in 2 dimension(s):
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#    0.5617091    0.5402798    0.7639426    0.1340685 
# Axis predictivity in 2 dimension(s):
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#    0.9579017    0.8400028    0.9980931    0.9365937 
# Sample predictivity in 2 dimension(s):
#         1         2         3         4         5         6         7         8 
# 0.9998927 0.9927400 0.9999141 0.9991226 0.9984312 0.9949770 0.9914313 0.9996346 
#         9        10        11        12        13        14        15        16 
# 0.9998677 0.9941340 0.9991205 0.9949153 0.9945491 0.9996034 0.9942676 0.9897890 
#        17        18        19        20        21        22        23        24 
# 0.9937752 0.9990534 0.9972926 0.9928624 0.9896250 0.9932656 0.9918132 0.9955885 
#        25        26        27        28        29        30        31        32 
# 0.9812917 0.9897303 0.9979903 0.9990514 0.9963870 0.9975607 0.9985741 0.9876345 
#        33        34        35        36        37        38        39        40 
# 0.9833383 0.9957412 0.9970200 0.9935405 0.9859750 0.9953399 0.9994047 0.9990244 
#        41        42        43        44        45        46        47        48 
# 0.9980903 0.9756895 0.9953372 0.9830035 0.9763861 0.9959863 0.9905695 0.9987006 
#        49        50        51        52        53        54        55        56 
# 0.9996383 0.9987482 0.9275369 0.9996655 0.9544488 0.9460515 0.9172857 0.9061058 
#        57        58        59        60        61        62        63        64 
# 0.9727694 0.9996996 0.8677939 0.8686502 0.9613130 0.9328852 0.4345132 0.9679973 
#        65        66        67        68        69        70        71        72 
# 0.7995848 0.9083037 0.7968614 0.5835260 0.7900027 0.8575646 0.8524748 0.6615410 
#        73        74        75        76        77        78        79        80 
# 0.9367709 0.8661203 0.8350955 0.8929908 0.8702600 0.9873164 0.9969031 0.6815512 
#        81        82        83        84        85        86        87        88 
# 0.8937189 0.8409681 0.7829405 0.9848354 0.6901625 0.8073582 0.9666041 0.6665514 
#        89        90        91        92        93        94        95        96 
# 0.6993846 0.9909923 0.9008345 0.9710941 0.8037223 0.9913632 0.9744493 0.7089660 
#        97        98        99       100       101       102       103       104 
# 0.9071738 0.9064541 0.9625371 0.9872279 0.9171603 0.9636413 0.9976224 0.9829885 
#       105       106       107       108       109       110       111       112 
# 0.9854704 0.9888092 0.8464463 0.9729353 0.9771293 0.9794313 0.9746239 0.9977302 
#       113       114       115       116       117       118       119       120 
# 0.9941859 0.9605563 0.8476794 0.9289985 0.9929982 0.9916850 0.9818957 0.9493751 
#       121       122       123       124       125       126       127       128 
# 0.9865358 0.8716778 0.9728177 0.9846364 0.9840890 0.9861783 0.9854516 0.9691512 
#       129       130       131       132       133       134       135       136 
# 0.9942007 0.9585884 0.9705389 0.9937852 0.9874192 0.9723192 0.9230503 0.9794405 
#       137       138       139       140       141       142       143       144 
# 0.8947527 0.9797055 0.9458421 0.9902488 0.9674660 0.9350646 0.9636413 0.9867931 
#       145       146       147       148       149       150 
# 0.9500265 0.9470544 0.9688318 0.9886543 0.8735433 0.9281727
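
For readers who want to see where these numbers come from, the base R sketch below computes the four measures from the singular value decomposition, using the usual definitions: quality as the proportion of total variation retained, adequacy as the squared length of each row of \({\bf{V}}_2\), and axis and sample predictivity as column-wise and row-wise ratios of fitted to observed sums of squares. These definitions are assumed here rather than taken from the fit.measures() source, and the object names are illustrative.

```r
# Base R sketch of the fit measures for a 2D PCA biplot (not biplotEZ code).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
r <- 2
V_r <- s$v[, 1:r]
X_hat <- X %*% V_r %*% t(V_r)     # rank-r approximation of X

quality     <- sum(s$d[1:r]^2) / sum(s$d^2)
adequacy    <- rowSums(V_r^2)
axis_pred   <- colSums(X_hat^2) / colSums(X^2)
sample_pred <- rowSums(X_hat^2) / rowSums(X^2)
names(adequacy) <- names(axis_pred) <- colnames(X)

round(100 * quality, 1)
round(adequacy, 4)
round(axis_pred, 4)
head(round(sample_pred, 4))
```

Samples with low predictivity, such as sample 63 in the output above, lie far from the biplot plane, so their positions in the biplot should be interpreted with care.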